New ridge parameter estimators for the quasi-Poisson ridge regression model

The quasi-Poisson regression model is used for count data and is preferred over the Poisson regression model for over-dispersed count data. The quasi-likelihood estimator is used to estimate the regression coefficients of the quasi-Poisson regression model, but it gives sub-optimal estimates when the regressors are highly correlated, a problem known as multicollinearity. Biased estimation methods are often used to overcome multicollinearity in regression models. In this study, we explore the ridge estimator for the quasi-Poisson regression model to mitigate multicollinearity. Furthermore, we propose various ridge parameter estimators for this model. We derive the theoretical properties of the ridge estimator and compare its performance with the quasi-likelihood estimator in terms of matrix and scalar mean squared error. We further compare the proposed estimators numerically through a Monte Carlo simulation study and a real-life application. Both the simulation and application results show the superiority of the ridge estimator, particularly with the best ridge parameter estimator, over the quasi-likelihood estimator in the presence of multicollinearity.


Aamir Shahzad 1, Muhammad Amin 1, Walid Emam 2* & Muhammad Faisal 3
Regression analysis is widely used for prediction and for identifying the factors associated with a response variable. Several types of regression models have been developed to measure the average relationship between the response variable and one or more explanatory variables. These models differ according to the distribution of the response variable and the type of relationship (linear or non-linear). One example is the generalized linear model (GLM) 1. The GLM has various applications in science, engineering, business and other fields 2,3.
Count models are used to examine the factors that influence a response variable taking non-negative integer values such as 0, 1, 2, 3, 4 and so on 4,5. Several count regression models have been developed, including the following:
- the Poisson regression model (PRM);
- the quasi-Poisson regression model (QPRM);
- the negative binomial regression model (NBRM);
- the Bell regression model (BRM);
- the Conway-Maxwell-Poisson regression model (CMPRM).

These models are used in different situations. The PRM 6 is used when the response variable follows a Poisson distribution, whose mean and variance are equal 7.
In real-life datasets, the PRM assumption of equal mean and variance is often violated. Over-dispersion occurs when the variance of the response variable exceeds its mean. The NBRM is used to handle over-dispersion 8. However, the NBRM requires a large sample, and its response must consist of non-negative counts. This model is commonly applied to over-dispersed count data 9. Conway and Maxwell introduced the Conway-Maxwell-Poisson distribution in 1962, and the CMPRM based on it can handle both over- and under-dispersion in count data 10,11. The QPRM is a generalization of the PRM and is commonly used to model an over-dispersed count variable 12. The QPRM is an alternative to the NBRM and is recommended when the variance is a linear function of the mean 13.
The quasi-likelihood estimator (QLE) is used to estimate the regression coefficients of the QPRM 14. However, it produces inefficient estimates when the regressors are highly correlated, a condition known as multicollinearity. Frisch 15 first used this term in the context of the linear regression model. When multicollinearity exists in a QPRM, the QLE has large variances. This can produce wrongly signed regression coefficients and wider confidence intervals. It also inflates the standard errors of the regression coefficients, which can lead to inaccurate conclusions 16.
Biased estimation methods are commonly used to mitigate the impact of multicollinearity. Several such methods have been developed to reduce the impact of multicollinearity and produce more reliable estimates [17][18][19]. The ridge estimator (RE) is a popular method to deal with the issue of multicollinearity in

The QPRM is specified through the first two moments of the response,

$$E(y) = \mu, \qquad \mathrm{Var}(y) = \gamma\mu,$$

where μ > 0 and γ is the over-dispersion parameter. Because the variance is proportional to the mean, the variance is a function of the mean. Wedderburn 12 characterized the QPRM by its first two moments (mean and variance). Efron 38 and Gelfand and Dalal 40 showed how to construct a full distribution for this model, although this requires re-parameterization. Estimation therefore usually proceeds from the first two moments and estimating equations 14. The quasi-likelihood of the QPRM does not require a fully specified probability density function; only the mean and the variance-mean relationship of the response need to be specified 41. The quasi-likelihood (QL) function is constructed in the same way as the usual likelihood function. The QLE is obtained by maximizing the quasi-log-likelihood function given by Wakefield 42:

$$Q(\mu; y) = \frac{1}{\gamma}\sum_{i=1}^{n}\left(y_i\log\mu_i - \mu_i\right). \tag{3}$$

Differentiating (3) with respect to β through μ_i and equating to zero gives the quasi-score function

$$U(\beta) = \frac{1}{\gamma}\sum_{i=1}^{n}\frac{y_i-\mu_i}{\mu_i}\,\frac{\partial\mu_i}{\partial\beta} = \frac{1}{\gamma}X'(y-\mu) = 0, \tag{4}$$

where μ = exp(Xβ). Here X is the covariate matrix of order n × (p + 1), and β is the column vector of regression coefficients of order (p + 1) × 1.
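As a numerical check of Eqs. (3) and (4), the sketch below evaluates the quasi-log-likelihood with the log link, dropping terms that do not involve μ. It then compares the analytic quasi-score X'(y − μ)/γ with a central finite-difference gradient. This is a minimal NumPy illustration; the function names and the toy data are hypothetical and do not come from the article.

```python
import numpy as np

def quasi_loglik(beta, X, y, gamma):
    """Eq. (3): (1/gamma) * sum(y*log(mu) - mu) with mu = exp(X beta)."""
    eta = X @ beta                      # log(mu) under the log link
    return np.sum(y * eta - np.exp(eta)) / gamma

def quasi_score(beta, X, y, gamma):
    """Eq. (4): X'(y - mu) / gamma."""
    mu = np.exp(X @ beta)
    return X.T @ (y - mu) / gamma

def numeric_grad(f, beta, h=1e-6):
    """Central finite-difference gradient of a scalar function f at beta."""
    g = np.zeros_like(beta)
    for j in range(beta.size):
        e = np.zeros_like(beta)
        e[j] = h
        g[j] = (f(beta + e) - f(beta - e)) / (2 * h)
    return g

# Hypothetical toy data: intercept plus one covariate
X = np.column_stack([np.ones(6), [0.1, 0.4, 0.3, 0.8, 0.5, 0.9]])
y = np.array([1.0, 3.0, 2.0, 6.0, 4.0, 7.0])
beta = np.array([0.5, 1.2])
gamma = 3.0

analytic = quasi_score(beta, X, y, gamma)
numeric = numeric_grad(lambda b: quasi_loglik(b, X, y, gamma), beta)
print(np.allclose(analytic, numeric, atol=1e-5))  # True
```

Because γ only rescales the quasi-score, the root of Eq. (4) does not depend on γ. The QPRM point estimates therefore coincide with the Poisson maximum-likelihood estimates; γ affects only their (co)variances.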
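Because Eq. (4) is non-linear in β, the regression parameters of the QPRM are computed by iteratively reweighted least squares (IWLS). The IWLS update at iteration t + 1 can be written as 13

$$\hat\beta^{(t+1)} = \left(X'\hat W^{(t)}X\right)^{-1}X'\hat W^{(t)}\hat z^{(t)}, \tag{5}$$

where $\hat W^{(t)} = \mathrm{diag}(\hat\mu_1^{(t)}, \ldots, \hat\mu_n^{(t)})$. The vector $\hat z^{(t)}$ is the working response, with elements $\hat z_i^{(t)} = \log\hat\mu_i^{(t)} + (y_i - \hat\mu_i^{(t)})/\hat\mu_i^{(t)}$. The QLE of β at the final iteration is defined as

$$\hat\beta_{QLE} = \left(X'\hat WX\right)^{-1}X'\hat W\hat z = F^{-1}X'\hat W\hat z, \tag{6}$$

where $F = X'\hat WX$. The QLE is asymptotically normally distributed. Its covariance matrix equals the inverse of the matrix of negative second derivatives of (3):

$$\mathrm{Cov}(\hat\beta_{QLE}) = \gamma\left(X'\hat WX\right)^{-1} = \gamma F^{-1}. \tag{7}$$

Furthermore, the MSE of $\hat\beta_{QLE}$ is given as

$$\mathrm{MSE}(\hat\beta_{QLE}) = E\left[(\hat\beta_{QLE}-\beta)(\hat\beta_{QLE}-\beta)'\right] = \gamma F^{-1}. \tag{8}$$

Applying the trace to both sides of Eq. (8) gives

$$\mathrm{MSE}(\hat\beta_{QLE}) = \mathrm{tr}\left(\gamma F^{-1}\right) = \gamma\sum_{j=1}^{p+1}\frac{1}{\lambda_j}, \tag{9}$$

where λ_j is the jth eigenvalue of the matrix F. When the explanatory variables in the QPRM are highly correlated, the weighted cross-product matrix F is ill-conditioned. Some λ_j are then close to zero, the corresponding terms 1/λ_j in Eq. (9) become large, and the QLE gives inefficient results with large variances. In this situation the estimated coefficients are difficult to interpret, because the estimated coefficient vector is, on average, too long.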
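The IWLS recursion of Eqs. (5) and (6) can be sketched in a few lines of NumPy. The sketch makes two assumptions of its own. First, it starts from η = log(y + 0.5), a common starting value. Second, it estimates the dispersion by the Pearson statistic divided by n − (p + 1); this chunk of the article does not state which dispersion estimator was used. Function names are hypothetical.

```python
import numpy as np

def qle_iwls(X, y, tol=1e-10, max_iter=100):
    """Fit the QPRM (log link, Var(y) = gamma * mu) by IWLS, Eqs. (5)-(6).

    X must include an intercept column. Returns (beta_qle, gamma_hat, F),
    with F = X' W X evaluated at the final iterate.
    """
    n, q = X.shape
    eta = np.log(y + 0.5)              # starting linear predictor (assumption)
    beta = np.zeros(q)
    for _ in range(max_iter):
        mu = np.exp(eta)
        z = eta + (y - mu) / mu        # working response z_i
        XtW = X.T * mu                 # X' W with W = diag(mu), without building diag(mu)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta, eta = beta_new, X @ beta_new
        if converged:
            break
    mu = np.exp(eta)
    F = (X.T * mu) @ X
    # Pearson (moment) estimate of the dispersion parameter (assumption)
    gamma_hat = np.sum((y - mu) ** 2 / mu) / (n - q)
    return beta, gamma_hat, F

def scalar_mse_qle(F, gamma):
    """Eq. (9): gamma * sum_j 1 / lambda_j, lambda_j the eigenvalues of F."""
    return gamma * np.sum(1.0 / np.linalg.eigvalsh(F))

# Intercept-only example: the QLE of the intercept is log(mean(y))
X = np.ones((4, 1))
y = np.array([1.0, 2.0, 3.0, 4.0])
b, g, F = qle_iwls(X, y)
```

In this intercept-only case the fit returns b[0] = log 2.5, γ̂ = 2/3 and F = [[10]], so Eq. (9) gives (2/3)/10. The weights W = diag(μ) omit γ because it cancels in Eq. (5).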

The quasi-Poisson ridge regression estimator
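When multicollinearity is present among the explanatory variables in the QPRM, the QLE does not perform efficiently and gives large variances for the estimated coefficients. Hoerl and Kennard 20 introduced ridge regression (RR) to reduce the effects of multicollinearity. In this study, we propose the quasi-Poisson ridge regression estimator (QPRRE) for over-dispersed count data to reduce the effects of multicollinearity. The QPRRE is defined by

$$\hat\beta_k = \left(F + kI_{p+1}\right)^{-1}F\hat\beta_{QLE}, \tag{10}$$

where $F = X'\hat WX$, k > 0 is the biasing parameter and I_{p+1} is the identity matrix of order (p + 1) × (p + 1). The bias, the covariance and the matrix MSE (MMSE) of $\hat\beta_k$ can be derived as follows:

$$\mathrm{Bias}(\hat\beta_k) = -k\left(F + kI_{p+1}\right)^{-1}\beta = -kT\Lambda_k^{-1}\alpha, \tag{11}$$

$$\mathrm{Cov}(\hat\beta_k) = \gamma\left(F + kI_{p+1}\right)^{-1}F\left(F + kI_{p+1}\right)^{-1} = \gamma T\Lambda_k^{-1}\Lambda\Lambda_k^{-1}T', \tag{12}$$

$$\mathrm{MMSE}(\hat\beta_k) = \gamma T\Lambda_k^{-1}\Lambda\Lambda_k^{-1}T' + k^2T\Lambda_k^{-1}\alpha\alpha'\Lambda_k^{-1}T'. \tag{13}$$

In Eqs. (11)-(13):
- Λ_k = diag(λ_1 + k, λ_2 + k, ..., λ_{p+1} + k);
- Λ = diag(λ_1, λ_2, ..., λ_{p+1}) = T'FT;
- T is the orthogonal matrix whose columns are the eigenvectors of F;
- α = T'β.

Finally, applying the trace to Eq. (13) gives the scalar MSE of the QPRRE:

$$\mathrm{MSE}(\hat\beta_k) = \gamma\sum_{j=1}^{p+1}\frac{\lambda_j}{(\lambda_j+k)^2} + k^2\sum_{j=1}^{p+1}\frac{\alpha_j^2}{(\lambda_j+k)^2} = M_1(k) + M_2(k), \tag{14}$$

where α_j is the jth element of α (estimated in practice by $T'\hat\beta_{QLE}$) and λ_j is the jth eigenvalue of F.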
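To make Eqs. (10)-(14) concrete, the sketch below computes the QPRRE and its scalar MSE in canonical form. It cross-checks that scalar MSE against the trace of the covariance plus the squared bias, computed directly from Eqs. (11) and (12). The 2 × 2 matrix F, the coefficient vector and γ are hypothetical illustration values; the function names are our own.

```python
import numpy as np

def qprre(F, beta_qle, k):
    """Eq. (10): beta_k = (F + k I)^{-1} F beta_qle."""
    q = F.shape[0]
    return np.linalg.solve(F + k * np.eye(q), F @ beta_qle)

def scalar_mse_qprre(F, beta, gamma, k):
    """Eq. (14): M1(k) + M2(k), using alpha = T' beta."""
    lam, T = np.linalg.eigh(F)                          # F = T diag(lam) T'
    alpha = T.T @ beta
    m1 = gamma * np.sum(lam / (lam + k) ** 2)           # total variance M1(k)
    m2 = k ** 2 * np.sum(alpha ** 2 / (lam + k) ** 2)   # squared bias M2(k)
    return m1 + m2

def mmse_trace(F, beta, gamma, k):
    """tr(Cov) + Bias'Bias computed directly from Eqs. (11)-(12)."""
    A = np.linalg.inv(F + k * np.eye(F.shape[0]))
    cov = gamma * A @ F @ A
    bias = -k * A @ beta
    return np.trace(cov) + bias @ bias

# Hypothetical, nearly singular F (det = 0.39)
F = np.array([[4.0, 1.9], [1.9, 1.0]])
beta = np.array([0.6, 0.8])
gamma = 2.0
for k in (0.0, 0.1, 0.7):
    print(k, scalar_mse_qprre(F, beta, gamma, k))
```

At k = 0 the scalar MSE reduces to γ·tr(F⁻¹), which is Eq. (9) for the QLE. A small k already lowers it for this ill-conditioned F.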

The superiority of the QPRRE to the QLE
Hoerl and Kennard 20 stated theorems on the MSE properties of the RR estimator in the linear regression model (LRM). Here, we prove that these theorems also hold for the QPRM and use them to establish the superiority of the QPRRE over the QLE.

Theorem 3.1
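The total variance M_1(k) and the squared bias M_2(k) are continuous functions of k. For k > 0 and λ_j > 0, M_1(k) is monotonically decreasing and M_2(k) is monotonically increasing.
Proof The first derivatives of M_1(k) and M_2(k) in Eq. (14) with respect to k are

$$\frac{\partial M_1(k)}{\partial k} = -2\gamma\sum_{j=1}^{p+1}\frac{\lambda_j}{(\lambda_j+k)^3}, \tag{15}$$

$$\frac{\partial M_2(k)}{\partial k} = 2k\sum_{j=1}^{p+1}\frac{\lambda_j\alpha_j^2}{(\lambda_j+k)^3}. \tag{16}$$

Since γ, k and λ_j are positive and α_j² is non-negative, Eq. (15) shows that M_1(k) is a continuous, monotonically decreasing function of k. Likewise, Eq. (16) shows that M_2(k) is a continuous, monotonically increasing function of k.

Theorem 3.2
There always exists a k > 0 such that MSE($\hat\beta_k$) < MSE($\hat\beta_{QLE}$).
Proof The first derivative of Eq. (14) with respect to k is

$$\frac{\partial\,\mathrm{MSE}(\hat\beta_k)}{\partial k} = 2\sum_{j=1}^{p+1}\frac{\lambda_j\left(k\alpha_j^2-\gamma\right)}{(\lambda_j+k)^3}. \tag{17}$$

Equation (17) shows that a sufficient condition for this derivative to be negative is 0 < k < γ/α²_max, where α²_max is the largest α_j². MSE($\hat\beta_k$) is continuous in k and equals MSE($\hat\beta_{QLE}$) at k = 0, and it strictly decreases on this interval. Hence any k in (0, γ/α²_max) gives MSE($\hat\beta_k$) < MSE($\hat\beta_{QLE}$).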
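The argument behind Theorems 3.1 and 3.2 can be checked numerically. The sketch below uses a hypothetical, ill-conditioned eigenvalue spectrum and canonical coefficients; these are not values from the article. It evaluates Eq. (14) and the derivative in Eq. (17), confirms the derivative against finite differences, and shows that every k below γ/α²_max beats the QLE.

```python
import numpy as np

def mse_k(lam, alpha, gamma, k):
    """Eq. (14) in canonical form: M1(k) + M2(k)."""
    return gamma * np.sum(lam / (lam + k) ** 2) + k ** 2 * np.sum(alpha ** 2 / (lam + k) ** 2)

def dmse_dk(lam, alpha, gamma, k):
    """Eq. (17): 2 * sum_j lam_j (k alpha_j^2 - gamma) / (lam_j + k)^3."""
    return 2.0 * np.sum(lam * (k * alpha ** 2 - gamma) / (lam + k) ** 3)

# Hypothetical ill-conditioned spectrum and canonical coefficients
lam = np.array([5.0, 0.5, 0.01])
alpha = np.array([0.6, 0.6, 0.5])
gamma = 2.0

k_bound = gamma / np.max(alpha ** 2)    # sufficient condition: 0 < k < gamma / alpha_max^2
ks = np.linspace(1e-6, k_bound, 200, endpoint=False)

mse0 = mse_k(lam, alpha, gamma, 0.0)    # equals the QLE MSE, gamma * sum(1 / lam)
print(all(dmse_dk(lam, alpha, gamma, k) < 0 for k in ks))   # True
print(all(mse_k(lam, alpha, gamma, k) < mse0 for k in ks))  # True
```

The bound is only sufficient. The MSE-minimizing k can lie beyond γ/α²_max, which is why several ridge parameter estimators are compared in practice.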

Selection of the biasing parameters
The RR estimator depends on the ridge parameter estimator (RPE), which plays a central role in its estimation. Choosing an optimal value of the shrinkage parameter k is the main concern in RR when minimizing the effects of high correlation among the explanatory variables. Many authors have proposed RPEs for different regression models in search of the optimal biasing parameter 20,24,31,37,43,44. Hoerl and Kennard 20 first presented the ridge estimation method to mitigate the effect of a high degree of correlation in the LRM. Their RPE has also been used for the gamma regression model (GRM) 44 and for the CMPRM 37. For the QPRRE, it is defined as

$$\hat k = \frac{\hat\gamma}{\hat\alpha_{\max}^2},$$

where γ̂ is the estimated dispersion parameter and $\hat\alpha_{\max}^2$ is the largest $\hat\alpha_j^2$, with $\hat\alpha = T'\hat\beta_{QLE}$.
Hoerl et al. 24 proposed a shrinkage parameter estimator for RR in the LRM, and we adapt this estimator to the QPRRE. Amin et al. 16,44 developed an RPE for the inverse Gaussian ridge regression (IGRR) model, and Akram et al. 45 proposed RPEs for the ridge estimator of the GRM.
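The sketch below computes two ridge parameters from a fitted QLE. The first is the Hoerl-Kennard form γ̂/α̂²_max given above. The second is an assumed Hoerl-Kennard-Baldwin-type adaptation, (p + 1)γ̂/Σα̂_j², that mirrors the LRM estimator pσ̂²/β̂'β̂. The exact formulas used for the remaining RPEs in the article are not reproduced here, and the function names are hypothetical.

```python
import numpy as np

def canonical_coefficients(F, beta_qle):
    """alpha_hat = T' beta_qle, where the columns of T are the eigenvectors of F."""
    _, T = np.linalg.eigh(F)
    return T.T @ beta_qle

def k_hoerl_kennard(F, beta_qle, gamma_hat):
    """HK-type RPE: gamma_hat / max_j alpha_hat_j^2."""
    alpha = canonical_coefficients(F, beta_qle)
    return gamma_hat / np.max(alpha ** 2)

def k_hkb(F, beta_qle, gamma_hat):
    """ASSUMED HKB-type adaptation: (p + 1) * gamma_hat / sum_j alpha_hat_j^2."""
    alpha = canonical_coefficients(F, beta_qle)
    return alpha.size * gamma_hat / np.sum(alpha ** 2)

# Illustrative values: F = diag(4, 1), beta_qle = (1, 2), gamma_hat = 2
F = np.diag([4.0, 1.0])
b = np.array([1.0, 2.0])
print(k_hoerl_kennard(F, b, 2.0), k_hkb(F, b, 2.0))   # 0.5 0.8
```

T is orthogonal, so Σα̂_j² equals β̂'β̂ and the HKB-type value needs no eigendecomposition. In addition, α̂²_max is at least the mean of the α̂_j², so the HK-type value never exceeds the HKB-type value.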

Simulation layout
In the QPRM, the response variable is generated from the quasi-Poisson distribution QP(μ_i, γ), where

$$\mu_i = \exp\left(\beta_0 + \beta_1x_{i1} + \beta_2x_{i2} + \cdots + \beta_px_{ip}\right), \qquad i = 1, 2, \ldots, n,$$

and β_0, β_1, ..., β_p are the regression parameters of the QPRM. These parameters are selected under the condition $\sum_{j=0}^{p}\beta_j^2 = 1$. The correlated regressors are generated by

$$x_{ij} = (1-\rho^2)^{1/2}z_{ij} + \rho z_{i,p+1}, \qquad i = 1, \ldots, n, \quad j = 1, \ldots, p,$$

where ρ² is the correlation between any two regressors and z_ij are independent standard normal pseudo-random numbers. The design factors are as follows:
- ρ = 0.80, 0.90, 0.95 and 0.99;
- the number of regressors p is 3, 6 and 12;
- the sample size n is 25, 50, 100, 150 and 200 for p = 3 and 6, and 50, 100, 150 and 200 for p = 12;
- the dispersion parameter γ is 2, 4 and 6.

The simulation is replicated 2000 times for each combination of n, p, γ and ρ. To assess the proposed ridge estimator with different RPEs, we use the MSE defined by

$$\mathrm{MSE}(\hat\beta) = \frac{1}{V}\sum_{i=1}^{V}\left(\hat\beta_i-\beta\right)'\left(\hat\beta_i-\beta\right),$$

where V is the number of replications. The term $\hat\beta_i - \beta$ is the difference between the estimate at the ith replication (from the proposed estimator or from the QLE) and the true parameter vector. All computations in this study are carried out in the R programming language.
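One replication of the simulation design above can be sketched in NumPy, although the study itself used R. The quasi-Poisson distribution is defined only through its first two moments, so this sketch assumes a way to draw QP(μ, γ) counts: negative binomial draws with size μ/(γ − 1), which give E(y) = μ and Var(y) = γμ for γ > 1. The equal-coefficient choice of β is also an assumption. Function names are hypothetical.

```python
import numpy as np

def make_regressors(n, p, rho, rng):
    """x_ij = sqrt(1 - rho^2) z_ij + rho z_{i,p+1}; any two columns have correlation rho^2."""
    z = rng.standard_normal((n, p + 1))
    return np.sqrt(1.0 - rho ** 2) * z[:, :p] + rho * z[:, [p]]

def quasi_poisson_draw(mu, gamma, rng):
    """ASSUMPTION: QP(mu, gamma) counts drawn from a negative binomial with
    E(y) = mu and Var(y) = gamma * mu (requires gamma > 1)."""
    size = mu / (gamma - 1.0)          # NumPy's n ("number of successes"), may be non-integer
    prob = size / (size + mu)          # mean = size * (1 - prob) / prob = mu
    return rng.negative_binomial(size, prob)

def simulated_mse(estimates, beta):
    """(1/V) * sum_i (b_i - beta)'(b_i - beta); rows of `estimates` are replications."""
    d = np.asarray(estimates) - beta
    return np.mean(np.sum(d ** 2, axis=1))

# One replication with illustrative settings
rng = np.random.default_rng(2024)
n, p, rho, gamma = 50, 3, 0.95, 4.0
beta = np.full(p + 1, 1.0 / np.sqrt(p + 1))   # ASSUMPTION: equal coefficients, sum of squares 1
X = np.column_stack([np.ones(n), make_regressors(n, p, rho, rng)])
y = quasi_poisson_draw(np.exp(X @ beta), gamma, rng)
```

In a full study, each replication would fit the QLE and the QPRRE to (X, y) and append the estimates to the arrays passed to simulated_mse.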

Simulation results and discussions
The estimated MSEs of the QPRREs are given in Tables 1–36 and are summarized as follows. (i) The main purpose of this simulation study is to evaluate the proposed RPEs for the QPRRE in the presence of multicollinearity. As the level of multicollinearity increases, with the number of predictors, the sample size and the dispersion held fixed, the estimated MSEs increase for all the estimation methods under study. However, at high levels of multicollinearity and with larger sample sizes, the QPRRE with RPEs $k_3$, $k_5$, $k_9$, $k_{11}$, $k_{12}$ and $k_{16}$ mostly gives smaller MSEs than the QPRRE with other RPEs and the QLE. (ii) The number of regressors also affects the estimated MSEs of the QPRRE and the QLE. When the number of regressors increases with the other factors fixed, the estimated MSEs of both the QPRRE and the QLE also increase. (iii) The estimated MSEs are inversely related to the sample size. As the sample size increases with the other factors fixed

Application: apprentice migration dataset
In this section, we examine the performance of the proposed estimators on a real-life dataset 46. The data concern apprentice migration to Edinburgh from elsewhere in Scotland between 1775 and 1799. The dataset contains n = 33 observations on one response variable and p = 4 predictors. The response y is the number of apprentices. The predictors are the distance (x_1), the population (x_2), the degree of urbanization (x_3) and the direction from Edinburgh (x_4). The suitability of the QP model is judged by the estimated dispersion, which is 9651.93 for this dataset. This value shows that the data are strongly over-dispersed, so the QPRM is more appropriate than the PRM. With four predictors, multicollinearity may be present. To test for it, we use a popular criterion, the condition index (CI). The CI is the square root of the ratio of the maximum eigenvalue to the minimum eigenvalue of the cross-product matrix of the independent variables. Its value here is 63.81, which exceeds 30 and indicates severe multicollinearity among the independent variables.
www.nature.com/scientificreports/
The QPRM is first estimated with the QLE, which performs well when the predictors are uncorrelated. Because the predictors here are highly correlated, the QLE does not provide good estimates, so the QPRRE is used to overcome the effect of multicollinearity. Table 37 shows the estimated coefficients and estimated scalar MSEs of the QLE and of the QPRRE under the proposed RPEs. The QLE and QPRRE estimates are obtained from Eqs. (6) and (10), respectively, and their estimated scalar MSEs from Eqs. (9) and (14), respectively. According to Table 37, the MSE of the QPRRE with the RPEs is smaller than the MSE of the QLE, so the QPRRE performs better than the QLE. More specifically, based on minimum MSE, the QPRRE with RPE $k_7$ performs best among the QPRRE with other RPEs and the QLE.
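The condition index described above reduces to a short eigenvalue calculation. The sketch below assumes the CI is computed from a symmetric positive-definite cross-product matrix. Which matrix the article used (for example X'X or the weighted F = X'ŴX) is not stated in this chunk, and the predictors below are hypothetical, not the apprentice data.

```python
import numpy as np

def condition_index(M):
    """CI = sqrt(lambda_max / lambda_min) for a symmetric positive-definite matrix M."""
    lam = np.linalg.eigvalsh(M)        # eigenvalues in ascending order
    return np.sqrt(lam[-1] / lam[0])

# Hypothetical nearly collinear predictors (x2 is almost 2 * x1)
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = 2.0 * x1 + np.array([0.01, -0.02, 0.0, 0.02, -0.01, 0.0])
X = np.column_stack([x1, x2])
ci = condition_index(X.T @ X)
print(ci > 30)   # True: severe multicollinearity by the CI > 30 rule of thumb
```

Because the CI depends on the scaling of the columns, predictors are often standardized before it is computed. Whether the article did so is not stated here.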
In real datasets, the MSE criterion does not always reflect the predictive performance of the estimators [47][48][49]. Cross-validation (CV) is therefore recommended as an additional model assessment criterion. CV with one observation left out at a time is also known as the prediction sum of squares (PRESS/CV(1)) or a jackknife fit at given explanatory variables 50. This criterion also has some limitations and comes in different variants 51. Various authors have used CV to assess the performance of their proposed estimators in different models [47][48][49]52,53. Here we use k-fold CV and the PRESS criterion to further evaluate the proposed RPEs in the QPRRE. The procedure to compute the CV follows 52 and Amin et al. 53. The PRESS is computed from the Pearson residuals of the QLE and the QPRRE, respectively, where h_ii are the ith diagonal elements of the hat matrix computed for the QLE and
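The sketch below shows how k-fold CV and a Pearson-residual PRESS could be computed for the QPRRE. It makes three assumptions of its own, since the article's exact formulas (following refs 52 and 53) are not reproduced in this text:
- the CV score is the mean squared prediction error on the count scale;
- the hat matrix is H = W^{1/2}X(F + kI)⁻¹X'W^{1/2}, with W taken from the QLE fit;
- the PRESS terms are Pearson residuals divided by (1 − h_ii).

All function names are hypothetical. The ridge parameter is called k, and the number of folds n_folds, to keep the two apart.

```python
import numpy as np

def fit_qle(X, y, n_iter=50):
    """QLE of the QPRM by IWLS (log link, W = diag(mu)) with a fixed number of iterations."""
    eta = np.log(y + 0.5)
    for _ in range(n_iter):
        mu = np.exp(eta)
        z = eta + (y - mu) / mu
        XtW = X.T * mu
        beta = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta
    return beta

def fit_qprre(X, y, k):
    """QPRRE of Eq. (10); also returns the QLE weights used in F = X'WX."""
    beta = fit_qle(X, y)
    w = np.exp(X @ beta)
    F = (X.T * w) @ X
    return np.linalg.solve(F + k * np.eye(X.shape[1]), F @ beta), w

def kfold_cv(X, y, k, n_folds=5, seed=0):
    """Mean squared prediction error of the QPRRE over n_folds random folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    sq_err = 0.0
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        b, _ = fit_qprre(X[train], y[train], k)
        sq_err += np.sum((y[test] - np.exp(X[test] @ b)) ** 2)
    return sq_err / len(y)

def press_pearson(X, y, k):
    """PRESS from Pearson residuals scaled by (1 - h_ii) (assumed hat matrix)."""
    b, w = fit_qprre(X, y, k)
    mu = np.exp(X @ b)
    Xw = X * np.sqrt(w)[:, None]                        # W^{1/2} X
    H = Xw @ np.linalg.solve(Xw.T @ Xw + k * np.eye(X.shape[1]), Xw.T)
    r = (y - mu) / np.sqrt(mu)                          # Pearson residuals
    return np.sum((r / (1.0 - np.diag(H))) ** 2)
```

With k = 0 both functions reduce to their QLE versions. Comparing them over a grid of ridge parameters mirrors the comparison of RPEs reported in Table 37.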

Conclusion
When the dependent variable is a count and is over-dispersed, the QPRM can be used to model it. In this study, we proposed different RPEs for the QPRRE to reduce the problems caused by multicollinearity among the explanatory variables. To assess the proposed ridge estimators, we conducted a simulation study under different sample sizes, numbers of predictor variables, dispersion levels and degrees of multicollinearity. We further evaluated the proposed ridge estimators on a real-life dataset on apprentice migration. The simulation study and the real-life application both show that the QPRRE with some of the available and proposed RPEs outperforms the QLE in the presence of severe multicollinearity. This evidence indicates that the QPRRE is a better estimation method than the QLE for over-dispersed count data with multicollinearity among the explanatory variables.

Table 1. Estimated MSEs for γ = 2, p = 3 and n = 25. Bold value indicates minimum MSE.
the estimated MSEs decrease. Based on minimum estimated MSE, the QPRRE with RPEs $k_3$, $k_5$, $k_9$, $k_{11}$, $k_{12}$ and $k_{16}$ mostly performs better than the other RPEs and the QLE. (iv) In every situation considered, the proposed QPRRE with different RPEs outperforms the QLE based on minimum MSE. This holds for all combinations of multicollinearity level, sample size, number of explanatory variables and dispersion level. (v) Across all the simulation conditions, the estimated MSE of the QLE is always greater than that of the QPRRE with any of the suggested RPEs, and the suggested QPRRE substantially reduces the estimated MSE. Finally, we conclude that the proposed RPEs for the QPRRE perform well and give better results than the QLE, with minimum estimated MSE under certain conditions. The QPRRE with RPEs $k_3$, $k_5$, $k_9$, $k_{11}$, $k_{12}$ and $k_{16}$ mostly gives better results than the QPRRE with other RPEs. In some situations, however, the QPRRE with other RPEs, namely $k_4$ to $k_8$ and $k_{15}$, performs better. The simulation results show that the QPRRE outperforms the QLE in the presence of multicollinearity. We therefore suggest that researchers use the QPRRE with the best biasing parameters $k_3$, $k_5$, $k_9$, $k_{11}$, $k_{12}$ and $k_{16}$ to minimize the effects of multicollinearity in the QPRM, owing to their robust performance.
Scientific Reports | (2024) 14:8489 | https://doi.org/10.1038/s41598-023-50085-5

Table 9. Estimated MSEs for γ = 4, p = 3 and n = 150. Bold value indicates minimum MSE.
$\hat\mu_{ki}$ is the predicted response of the QPRRE under the different RPEs, and $h_{kii}$ are the diagonal elements of the hat matrix obtained for the QPRRE. The estimated CV and PRESS values for the QLE and for the QPRRE with all RPEs are given in Table 37. Based on the CV results, the QPRRE with RPEs $k_3$, $k_9$ to $k_{12}$ and $k_{16}$ performs better than the QPRRE with the other RPEs and the QLE. Under the PRESS criterion, the QPRRE with RPEs $k_1$ to $k_8$ performs better than the others. In view of the simulation and application findings, we suggest that researchers use the QPRRE with RPEs $k_3$ to $k_9$, $k_{11}$, $k_{12}$ and $k_{16}$ to mitigate the effects of multicollinearity in the QPRM.

Table 37. Estimated regression coefficients and MSEs of the selected estimators for the apprentice migration dataset. Bold value indicates minimum MSE.